An eco-informatics tool for microbial community studies: supervised classification of Amplicon Length Heterogeneity (ALH) profiles of 16S rRNA.

نویسندگان

  • Chengyong Yang
  • DeEtta Mills
  • Kalai Mathee
  • Yong Wang
  • Krish Jayachandran
  • Masoumeh Sikaroodi
  • Patrick Gillevet
  • Jim Entry
  • Giri Narasimhan
چکیده

Support vector machines (SVM) and K-nearest neighbors (KNN) are two computational machine learning tools that perform supervised classification. This paper presents a novel application of such supervised analytical tools for microbial community profiling and to distinguish patterning among ecosystems. Amplicon length heterogeneity (ALH) profiles from several hypervariable regions of 16S rRNA gene of eubacterial communities from Idaho agricultural soil samples and from Chesapeake Bay marsh sediments were separately analyzed. The profiles from all available hypervariable regions were concatenated to obtain a combined profile, which was then provided to the SVM and KNN classifiers. Each profile was labeled with information about the location or time of its sampling. We hypothesized that after a learning phase using feature vectors from labeled ALH profiles, both these classifiers would have the capacity to predict the labels of previously unseen samples. The resulting classifiers were able to predict the labels of the Idaho soil samples with high accuracy. The classifiers were less accurate for the classification of the Chesapeake Bay sediments suggesting greater similarity within the Bay's microbial community patterns in the sampled sites. The profiles obtained from the V1+V2 region were more informative than that obtained from any other single region. However, combining them with profiles from the V1 region (with or without the profiles from the V3 region) resulted in the most accurate classification of the samples. The addition of profiles from the V 9 region appeared to confound the classifiers. Our results show that SVM and KNN classifiers can be effectively applied to distinguish between eubacterial community patterns from different ecosystems based only on their ALH profiles.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction

BACKGROUND Culture-independent molecular surveys targeting conserved marker genes, most notably 16S rRNA, to assess microbial diversity remain semi-quantitative due to variations in the number of gene copies between species. RESULTS Based on 2,900 sequenced reference genomes, we show that 16S rRNA gene copy number (GCN) is strongly linked to microbial phylogenetic taxonomy, potentially under-...

متن کامل

RiboFR-Seq: a novel approach to linking 16S rRNA amplicon profiles to metagenomes

16S rRNA amplicon analysis and shotgun metagenome sequencing are two main culture-independent strategies to explore the genetic landscape of various microbial communities. Recently, numerous studies have employed these two approaches together, but downstream data analyses were performed separately, which always generated incongruent or conflict signals on both taxonomic and functional classific...

متن کامل

fEMS ikter An assessment of the hypervariable domains of the16S rRNA genes for their value in determining microbial community diversity: the paradox of traditional ecological indices

Amplicon length heterogeneity PCR (LH-PCR) was investigated for its ability to distinguish between microbial community patterns from the same soil type under different land management practices. Natural sagebrush and irrigated mouldboard-ploughed soils from Idaho were queried as to which hypervariable domains, or combinations of 16S rRNA gene domains, were the best molecular markers. Using stan...

متن کامل

Analysis of the mouse gut microbiome using full-length 16S rRNA amplicon sequencing

Demands for faster and more accurate methods to analyze microbial communities from natural and clinical samples have been increasing in the medical and healthcare industry. Recent advances in next-generation sequencing technologies have facilitated the elucidation of the microbial community composition with higher accuracy and greater throughput than was previously achievable; however, the shor...

متن کامل

Analysis of 16S rRNA Amplicon Sequencing Options on the Roche/454 Next-Generation Titanium Sequencing Platform

BACKGROUND 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, e...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of microbiological methods

دوره 65 1  شماره 

صفحات  -

تاریخ انتشار 2006